The case study follows the six step data analysis process: - ask - prepare - process - analyze - share - act
Bellabeat is a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. I’ve been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights I discover will then help guide marketing strategy for the company.
Business Task: Identifying the trends of smart device usage and making data-driven business strategies to determine the opportunities for growth.
Questions for the analysis: 1. What are some trends in smart device usage? 2. How could these trends apply to Bellabeat customers? 3. How could these trends help influence Bellabeat marketing strategy?
Primary stakeholders: Urška Sršen and Sando Mur, executive team members. Secondary stakeholders: Bellabeat marketing analytics team.
R Studio was used to complete this analysis because of the many packages and data visualization features available to explore the data with. ### Installing the packages
install.packages("plotly", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/svlas/OneDrive/Dokumente/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'plotly' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\svlas\AppData\Local\Temp\RtmpW09t0R\downloaded_packages
install.packages("tidyverse", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/svlas/OneDrive/Dokumente/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'tidyverse' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\svlas\AppData\Local\Temp\RtmpW09t0R\downloaded_packages
install.packages("dplyr", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/svlas/OneDrive/Dokumente/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'dplyr' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'dplyr'
## Warning in file.copy(savedcopy, lib, recursive = TRUE):
## problem copying C:\Users\svlas\OneDrive\Dokumente\R\win-
## library\4.1\00LOCK\dplyr\libs\x64\dplyr.dll to C:
## \Users\svlas\OneDrive\Dokumente\R\win-library\4.1\dplyr\libs\x64\dplyr.dll:
## Permission denied
## Warning: restored 'dplyr'
##
## The downloaded binary packages are in
## C:\Users\svlas\AppData\Local\Temp\RtmpW09t0R\downloaded_packages
install.packages("here", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/svlas/OneDrive/Dokumente/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'here' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\svlas\AppData\Local\Temp\RtmpW09t0R\downloaded_packages
install.packages("janitor", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/svlas/OneDrive/Dokumente/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'janitor' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\svlas\AppData\Local\Temp\RtmpW09t0R\downloaded_packages
install.packages("skimr", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/svlas/OneDrive/Dokumente/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'skimr' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\svlas\AppData\Local\Temp\RtmpW09t0R\downloaded_packages
install.packages("ggplot2", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/svlas/OneDrive/Dokumente/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'ggplot2' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\svlas\AppData\Local\Temp\RtmpW09t0R\downloaded_packages
install.packages("lubridate", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/svlas/OneDrive/Dokumente/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'lubridate' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'lubridate'
## Warning in file.copy(savedcopy, lib, recursive = TRUE):
## problem copying C:\Users\svlas\OneDrive\Dokumente\R\win-
## library\4.1\00LOCK\lubridate\libs\x64\lubridate.dll
## to C:\Users\svlas\OneDrive\Dokumente\R\win-
## library\4.1\lubridate\libs\x64\lubridate.dll: Permission denied
## Warning: restored 'lubridate'
##
## The downloaded binary packages are in
## C:\Users\svlas\AppData\Local\Temp\RtmpW09t0R\downloaded_packages
install.packages("RColorBrewer", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/svlas/OneDrive/Dokumente/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'RColorBrewer' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\svlas\AppData\Local\Temp\RtmpW09t0R\downloaded_packages
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble 3.1.6 v dplyr 1.0.9
## v tidyr 1.2.0 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## v purrr 0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
library(here)
## here() starts at C:/sources/R_project
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(skimr)
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(RColorBrewer)
dailyActivity <- read.csv("C:\\sources\\R_project\\dailyActivity.csv")
View(dailyActivity)
sleepDay_merged <- read.csv("C:\\sources\\R_project\\sleepDay_merged.csv")
View(sleepDay_merged)
weightLogInfo_merged <- read.csv("C:\\sources\\R_project\\weightLogInfo_merged.csv")
View(weightLogInfo_merged)
To be sure that the data frames were imported correctly, the head() function is used. The colnames() and glimpse() functions were used to determine what variables are included in the data frames and find commonalities
head(dailyActivity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
glimpse(dailyActivity)
## Rows: 940
## Columns: 15
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityDate <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/~
## $ TotalSteps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019~
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
## $ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
## $ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
## $ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
## $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203~
colnames(dailyActivity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
head(sleepDay_merged)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
glimpse(sleepDay_merged)
## Rows: 413
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150~
## $ SleepDay <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "~
## $ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2~
## $ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3~
colnames(sleepDay_merged)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
head(weightLogInfo_merged)
## Id Date WeightKg WeightPounds Fat BMI
## 1 1503960366 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## 2 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 3 1927972279 4/13/2016 1:08:52 AM 133.5 294.3171 NA 47.54
## 4 2873212765 4/21/2016 11:59:59 PM 56.7 125.0021 NA 21.45
## 5 2873212765 5/12/2016 11:59:59 PM 57.3 126.3249 NA 21.69
## 6 4319703577 4/17/2016 11:59:59 PM 72.4 159.6147 25 27.45
## IsManualReport LogId
## 1 True 1.462234e+12
## 2 True 1.462320e+12
## 3 False 1.460510e+12
## 4 True 1.461283e+12
## 5 True 1.463098e+12
## 6 True 1.460938e+12
glimpse(weightLogInfo_merged)
## Rows: 67
## Columns: 8
## $ Id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212~
## $ Date <chr> "5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM", "4/13/2~
## $ WeightKg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, ~
## $ WeightPounds <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159.6~
## $ Fat <int> 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, NA,~
## $ BMI <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25,~
## $ IsManualReport <chr> "True", "True", "False", "True", "True", "True", "True"~
## $ LogId <dbl> 1.462234e+12, 1.462320e+12, 1.460510e+12, 1.461283e+12,~
colnames(weightLogInfo_merged)
## [1] "Id" "Date" "WeightKg" "WeightPounds"
## [5] "Fat" "BMI" "IsManualReport" "LogId"
As we can see, the data type of the Date column is character. We need to convert the data type to date time in all of three data frames.
dailyActivity$ActivityDateNew <- strptime(dailyActivity$ActivityDate, '%m/%d/%Y')
class(dailyActivity$ActivityDateNew)
## [1] "POSIXlt" "POSIXt"
sleepDay_merged$SleepDayNew <- strptime(sleepDay_merged$SleepDay, '%m/%d/%Y %I:%M:%S %p')
class(sleepDay_merged$SleepDayNew)
## [1] "POSIXlt" "POSIXt"
weightLogInfo_merged$DateNew <- strptime(weightLogInfo_merged$Date, '%m/%d/%Y %I:%M:%S %p')
class(weightLogInfo_merged$DateNew)
## [1] "POSIXlt" "POSIXt"
dailyActivity$day_of_week <- format(as.Date(dailyActivity$ActivityDateNew), "%A")
dailyActivity$day_of_week <- ordered(dailyActivity$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
We use the n_distinct() function to determine the number of unique values.
n_distinct(dailyActivity$Id)
## [1] 33
n_distinct(sleepDay_merged$Id)
## [1] 24
n_distinct(weightLogInfo_merged$Id)
## [1] 8
As we can see, the dailyActivity database has 33 unique users, at the same time weightLogInfo database has only 8 unique users, and sleepDay database - 24. That means that reliable insights and recommendations cannot be made from these data frames.
To get key statistics about the data frames we use the summary() function.
dailyActivity%>%
select(TotalSteps,TotalDistance,VeryActiveMinutes,FairlyActiveMinutes,LightlyActiveMinutes,SedentaryMinutes,Calories)%>%
summary()
## TotalSteps TotalDistance VeryActiveMinutes FairlyActiveMinutes
## Min. : 0 Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 7406 Median : 5.245 Median : 4.00 Median : 6.00
## Mean : 7638 Mean : 5.490 Mean : 21.16 Mean : 13.56
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.: 32.00 3rd Qu.: 19.00
## Max. :36019 Max. :28.030 Max. :210.00 Max. :143.00
## LightlyActiveMinutes SedentaryMinutes Calories
## Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.:127.0 1st Qu.: 729.8 1st Qu.:1828
## Median :199.0 Median :1057.5 Median :2134
## Mean :192.8 Mean : 991.2 Mean :2304
## 3rd Qu.:264.0 3rd Qu.:1229.5 3rd Qu.:2793
## Max. :518.0 Max. :1440.0 Max. :4900
As we can see, the average user takes 7638 steps per day, which is not enough because a reasonable target for healthy adults is 10 000 steps. According to recommendations of the World Health Organization Adults should do at least 75-150 minutes of intensity aerobic physical activity. The average Fitbit user gets 21.16 minutes of very active activity a day, which means 148.12 minutes a week. That means that the fitbit users make enough activity for substantial health benefits. However, the average amount of sedentary is 991.2 minutes, which equals 16.52 hours a day. Adults should start by doing small amounts of physical activity, and gradually increase the frequency, intensity and duration over time. Besides, the average calorie burn per day reaches 2304 calories. The average person burns around 1800 calories a day doing absolutely nothing. However, we can not make any conclusions, because we know nothing about the gender, or age of Fitbit users.
sleepDay_merged%>%
select(TotalMinutesAsleep,TotalTimeInBed)%>%
summary()
## TotalMinutesAsleep TotalTimeInBed
## Min. : 58.0 Min. : 61.0
## 1st Qu.:361.0 1st Qu.:403.0
## Median :433.0 Median :463.0
## Mean :419.5 Mean :458.6
## 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :796.0 Max. :961.0
According to the data, the average Fitbit user sleeps 419.5 minutes a day, which equals 6.99 hours a day. National Sleep Foundation guidelines advise that healthy adults need between 7 and 9 hours of sleep per night.
weightLogInfo_merged%>%
select(WeightKg,BMI)%>%
summary()
## WeightKg BMI
## Min. : 52.60 Min. :21.45
## 1st Qu.: 61.40 1st Qu.:23.96
## Median : 62.50 Median :24.39
## Mean : 72.04 Mean :25.19
## 3rd Qu.: 85.05 3rd Qu.:25.56
## Max. :133.50 Max. :47.54
As we already know, this data frame has a low number of participants, the average BMI is 25.19. This is considered an overweight BMI. However, BMI can be a screening tool and does not diagnose the body fatness or health of an individual.
Recommendations to business
First of all this data in this study only included the information from 33 people over the course of about a month. That means that there is not enough data to provide deep analysis and we need to do more research. Based on the analysis, we can determine that sedentaries make up a significant portion, 81% of users daily active minutes. Users spend on average 12 hours a day in sedentary minutes, 4 hours lightly active, and only half-hour in fairly+very active. To make their users more active, Bellabeat can include a function in the Bellabeat app to alert users who tend to have a high number to sedentary minutes. Also we see that on Saturday users take more steps, burn more calories, and spend less time sedentary. It would be useful if Bellabeat could provide users with some training programs or challenges to have short active exercises during the week.